Concretely Annotated Corpora
Abstract
In either setting, it is common for a research group to generate bulk annotations over a preferred corpus internally, using their own tools, programming languages, and formats, but then to report on this as merely an engineering pre-processing step not worth describing in significant detail. Worse, these annotated collections are often not available to the rest of the community, making it difficult to perform an apples-to-apples comparison of the “real research”.
Similar resources
Towards a (Better) Definition of the Description of Annotated MIR Corpora
Today, annotated MIR corpora are provided by various research labs or companies, each one using its own annotation methodology, concept definitions, and formats. This is not an issue as such. However, the lack of descriptions of the methodology used—how the corpus was actually annotated, and by whom—and of the annotated concepts, i.e. what is actually described, is a problem with respect to the...
An Annotated Corpus Management Tool: ChaKi
Large-scale annotated corpora are very important not only in linguistic research but also in practical natural language processing tasks, since a number of practical tools such as part-of-speech (POS) taggers and syntactic parsers are now corpus-based or machine learning-based systems which require some amount of accurately annotated corpora. This article presents an annotated corpus management t...
Feasibility of pooling annotated corpora for clinical concept extraction
Availability of annotated corpora has facilitated application of machine learning algorithms to concept extraction from clinical notes. However, it is expensive to prepare annotated corpora in individual institutions, and pooling of annotated corpora from other institutions is a potential solution. In this paper we investigate whether pooling of corpora from two different sources can improve p...
WebBANC: Building Semantically-Rich Annotated Corpora from Web User Annotations of Minority Languages
Annotated corpora are sets of structured text used to enable Natural Language Processing (NLP) tasks. Annotations may include tagged parts-of-speech, semantic concepts assigned to phrases, or semantic relationships between these concepts in text. Building annotated corpora is labor-intensive and presents a major obstacle to advancing machine translators, named entity recognizers (NER), part-ofs...
Sharing Network Parameters for Crosslingual Named Entity Recognition
Most state-of-the-art approaches for Named Entity Recognition rely on handcrafted features and annotated corpora. Recently, neural network-based models have been proposed which do not require handcrafted features but still require annotated corpora. However, such annotated corpora may not be available for many languages. In this paper, we propose a neural network-based model which allows sharin...
Publication date: 2014